13. Numerical Features & Feature Column API
Transform/Preprocess Numerical Features with Feature Column API
TensorFlow Dataset API
The TensorFlow Dataset (tf.data) API helps build a flexible and efficient input pipeline that delivers data to each training step. The pipeline can aggregate data from various sources and batch it. In simple terms, it makes loading a dataset easy.
We introduce the TensorFlow Dataset API here because it is particularly helpful when the amount of data is enormous, when the data is stored in different formats across a distributed file system, or when it needs transformations while loading.
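As an illustration of such a pipeline, here is a minimal sketch that builds a tf.data dataset from in-memory values and batches it. The field names (age, bmi), the sample values, and the make_dataset helper are hypothetical and not part of the lesson's code; real data would usually be read from files.

```python
import tensorflow as tf

# Hypothetical in-memory records, used only for illustration.
features = {
    "age": [34.0, 51.0, 47.0, 29.0, 62.0, 45.0],
    "bmi": [22.1, 30.4, 27.8, 24.9, 31.2, 26.0],
}
labels = [0, 1, 1, 0, 1, 0]


def make_dataset(features, labels, batch_size=4, shuffle=True):
    """Build a tf.data pipeline yielding (feature dict, label) batches."""
    ds = tf.data.Dataset.from_tensor_slices((features, labels))
    if shuffle:
        # Shuffle over the whole (small) dataset before batching.
        ds = ds.shuffle(buffer_size=len(labels))
    return ds.batch(batch_size)


# Shuffling is disabled here so the first batch is predictable.
dataset = make_dataset(features, labels, shuffle=False)
first_features, first_labels = next(iter(dataset))
```

Each element of the dataset is a pair: a dictionary mapping field names to tensors, and a tensor of labels. This dictionary form is what the feature columns look up by key.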
TensorFlow Feature Column API Key Points
The TensorFlow Feature Column API makes data preprocessing easier by abstracting away some of the work, such as normalizing numerical features. If you have done this type of work in scikit-learn or PySpark, you might appreciate what this API does for you when preparing features for modeling. It also supports less common feature types such as crossed features and shared embeddings.
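To make "normalization" concrete, here is a small worked example of z-score scaling, one common choice, written with only the Python standard library. The z_scores helper and the sample values are hypothetical; the sketch uses the population standard deviation, which is an assumption rather than something the lesson specifies.

```python
import statistics


def z_scores(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical measurements for one numerical field.
heart_rates = [60.0, 70.0, 80.0, 90.0, 100.0]
scaled = z_scores(heart_rates)
```

After scaling, the values are centered on zero and expressed in units of standard deviations, which puts numerical features measured on different scales on a comparable footing.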
Numerical Features and TensorFlow Feature Columns API
To use TensorFlow feature columns with numerical features, we need to do the following:
- Identify the fields with numerical features.
- Use the TensorFlow Dataset API to load the dataset.
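- Create your own custom normalizer function, such as a z-score. It takes the column's tensor plus the mean and standard deviation computed from the training data:
def z_score_normalizer(col, mean, std):
    return (col - mean) / std
- Use the TensorFlow numeric_column feature column and pass the normalizer to its normalizer_fn argument. Because normalizer_fn is called with a single tensor, bind mean and std first with functools.partial:
import functools
import tensorflow as tf
normalizer = functools.partial(z_score_normalizer, mean=mean, std=std)
tf.feature_column.numeric_column(column_name, normalizer_fn=normalizer)
- Let the TensorFlow Feature Column API do its magic!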
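The steps above can be sketched end to end as follows. This is a minimal illustration, not the lesson's own code: the "age" field and its values are hypothetical, the statistics use the population standard deviation as an assumption, and the normalizer is called directly on a tensor to show its effect rather than through a model. Note that recent TensorFlow releases mark tf.feature_column as deprecated in favor of Keras preprocessing layers, so this sketch assumes a version where the module is still available.

```python
import functools
import statistics

import tensorflow as tf


def z_score_normalizer(col, mean, std):
    """Standardize a tensor: subtract the mean, divide by the std."""
    return (col - mean) / std


# Hypothetical training values for one numerical field.
train_ages = [30.0, 40.0, 50.0, 60.0, 70.0]
mean = statistics.mean(train_ages)
std = statistics.pstdev(train_ages)  # assumption: population std

# normalizer_fn receives a single tensor, so bind the statistics first.
normalizer = functools.partial(z_score_normalizer, mean=mean, std=std)

age_column = tf.feature_column.numeric_column("age", normalizer_fn=normalizer)

# Apply the column's normalizer to a batch to inspect the result.
batch = tf.constant(train_ages)
normalized = age_column.normalizer_fn(batch)
```

Computing the mean and standard deviation from the training split only, and reusing them for validation and test data, keeps information from the held-out data out of the preprocessing.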
Code
If you need the code, see https://github.com/udacity.
Numerical Features
SOLUTION:
- You must identify the fields with numerical features.
- You should use the TensorFlow Dataset API to convert the dataset to TensorFlow tensors for the TF Feature Column API.